 time-delay neural network


Carnatic Raga Identification System using Rigorous Time-Delay Neural Network

Natesan, Sanjay, Beigi, Homayoon

arXiv.org Artificial Intelligence

Large-scale machine-learning-based raga identification remains a nontrivial problem in the computational study of Carnatic music. Each raga consists of many unique and intrinsic melodic patterns that can be used to distinguish it from others. The identified ragas can then be used to cluster songs within the same raga, as well as to identify songs in other closely related ragas. In this case, the input sound is analyzed using a combination of steps, including a Discrete Fourier Transform and triangular filtering to create custom bins of possible notes, extracting features from the presence or absence of particular notes. A combination of neural networks, including 1D Convolutional Neural Networks (conventionally known as Time-Delay Neural Networks) and Long Short-Term Memory (LSTM) networks, which are a form of Recurrent Neural Network, forms the backbone of the classification strategy used to build the model. In addition, to help with variations in shruti, a long-time attention-based mechanism will be implemented to determine the relative changes in frequency rather than the absolute differences. This will provide a much more meaningful data point when training on audio clips in different shrutis. To evaluate the accuracy of the classifier, a dataset of 676 recordings is used. The songs are distributed across the list of ragas. The goal of this program is to be able to effectively and efficiently label a much wider range of audio clips in more shrutis, ragas, and with more background noise.
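The feature-extraction step described in this abstract (a Discrete Fourier Transform followed by triangular filters that bin energy by note) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function names, the equal-tempered 12-notes-per-octave spacing, the Hann window, and the two-octave range are all hypothetical choices. Placing the filter centres relative to the tonic is one simple way to make the features depend on relative rather than absolute frequency.

```python
import numpy as np

def note_filterbank(n_fft, sample_rate, tonic_hz, n_octaves=2, bins_per_octave=12):
    """Triangular filters centred on notes spaced relative to the tonic (shruti).

    Assumption: equal-tempered spacing, used here only for simplicity.
    Returns an array of shape (n_octaves * bins_per_octave, n_fft // 2 + 1).
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # Note centres, with one extra point on each side for the triangle edges.
    steps = np.arange(-1, n_octaves * bins_per_octave + 1)
    centres = tonic_hz * 2.0 ** (steps / bins_per_octave)
    bank = np.zeros((len(centres) - 2, len(freqs)))
    for i in range(1, len(centres) - 1):
        lo, mid, hi = centres[i - 1], centres[i], centres[i + 1]
        rising = (freqs - lo) / (mid - lo)    # 0 at lo, 1 at mid
        falling = (hi - freqs) / (hi - mid)   # 1 at mid, 0 at hi
        bank[i - 1] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def note_features(frame, sample_rate, tonic_hz):
    """Energy per note bin for one audio frame (hypothetical helper)."""
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    bank = note_filterbank(len(frame), sample_rate, tonic_hz)
    return bank @ magnitude
```

With this layout, a tone seven semitones above the tonic falls in bin 7 whichever tonic is chosen, so the same melodic interval produces the same feature index in different shrutis.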


Representation and Induction of Finite State Machines using Time-Delay Neural Networks

Neural Information Processing Systems

This work investigates the representational and inductive capabilities of time-delay neural networks (TDNNs) in general, and of two subclasses of TDNN, those with delays only on the inputs (IDNN), and those which include delays on hidden units (HDNN). Both architectures are capable of representing the same class of languages, the definite memory machine (DMM) languages, but the delays on the hidden units in the HDNN help it outperform the IDNN on problems composed of repeated features over short time windows.


Representation and Induction of Finite State Machines using Time-Delay Neural Networks

Clouse, Daniel S., Giles, C. Lee, Horne, Bill G., Cottrell, Garrison W.

Neural Information Processing Systems

This work investigates the representational and inductive capabilities of time-delay neural networks (TDNNs) in general, and of two subclasses of TDNN, those with delays only on the inputs (IDNN), and those which include delays on hidden units (HDNN). Both architectures are capable of representing the same class of languages, the definite memory machine (DMM) languages, but the delays on the hidden units in the HDNN help it outperform the IDNN on problems composed of repeated features over short time windows.
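As a rough illustration of what a definite memory machine is, the sketch below assumes the usual automata-theoretic reading, which the abstract itself does not spell out: the output at each step is determined entirely by a fixed-length window of the most recent inputs. The helper `make_dmm`, the zero padding at the start of the sequence, and the majority rule are hypothetical and not taken from the paper.

```python
def make_dmm(order, rule):
    """Build a machine whose output depends only on the last `order` symbols."""
    def run(symbols):
        outputs = []
        for t in range(len(symbols)):
            window = tuple(symbols[max(0, t - order + 1): t + 1])
            # Pad the start with zeros so every window has length `order`.
            window = (0,) * (order - len(window)) + window
            outputs.append(rule(window))
        return outputs
    return run

# Hypothetical rule: output 1 when at least two of the last three symbols are 1.
majority = make_dmm(3, lambda w: int(sum(w) >= 2))

print(majority([1, 0, 1, 1, 0]))  # [0, 0, 1, 1, 1]
```

Two input strings that end in the same three symbols always produce the same final output, however their earlier symbols differ. This finite-window property is what a network with a bounded set of input delays can represent.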


Representation and Induction of Finite State Machines using Time-Delay Neural Networks

Clouse, Daniel S., Giles, C. Lee, Horne, Bill G., Cottrell, Garrison W.

Neural Information Processing Systems

This work investigates the representational and inductive capabilities of time-delay neural networks (TDNNs) in general, and of two subclasses of TDNN, those with delays only on the inputs (IDNN), and those which include delays on hidden units (HDNN). Both architectures are capable of representing the same class of languages, the definite memory machine (DMM) languages, but the delays on the hidden units in the HDNN help it outperform the IDNN on problems composed of repeated features over short time windows.

1 Introduction

In this paper we consider the representational and inductive capabilities of time-delay neural networks (TDNN) [Waibel et al., 1989] [Lang et al., 1990], also known as NNFIR [Wan, 1993]. A TDNN is a feed-forward network in which the set of inputs to any node i may include the output from previous layers not only in the current time step t, but from d earlier time steps as well. The activation function for node i at time t in such a network is given by equation 1. TDNNs have been used in speech recognition [Waibel et al., 1989], and time series prediction [Wan, 1993]. In this paper we concentrate on the language induction problem.
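The verbal description of a TDNN node, whose inputs include previous-layer outputs at the current time step and at d earlier steps, can be sketched as a single layer in NumPy. This is a minimal sketch and not the paper's equation 1: the weight layout `(d + 1, n_in, n_out)`, the tanh nonlinearity, and the zero padding for missing history are all assumptions made for illustration.

```python
import numpy as np

def tdnn_layer(inputs, weights, bias):
    """One time-delay layer.

    inputs:  (T, n_in) outputs of the previous layer over T time steps
    weights: (d + 1, n_in, n_out), weights[k] applies to the input at time t - k
    bias:    (n_out,)
    Returns an array of shape (T, n_out).
    """
    n_delays = weights.shape[0]
    n_steps = inputs.shape[0]
    # Zero-pad the past so early time steps see zeros for missing history.
    padded = np.vstack([np.zeros((n_delays - 1, inputs.shape[1])), inputs])
    total = np.zeros((n_steps, weights.shape[2]))
    for k in range(n_delays):
        # padded[start + t] equals inputs[t - k] (or zero before the sequence starts).
        start = n_delays - 1 - k
        total += padded[start:start + n_steps] @ weights[k]
    return np.tanh(total + bias)  # tanh is an assumed nonlinearity
```

Applying the layer directly to the input sequence corresponds to delays on the inputs only. Stacking a second such layer on top of the first adds delays on hidden units, which is the distinction the abstract draws between IDNN and HDNN.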


Connectionist Architectures for Multi-Speaker Phoneme Recognition

Hampshire II, John B., Waibel, Alex

Neural Information Processing Systems

We present a number of Time-Delay Neural Network (TDNN) based architectures for multi-speaker phoneme recognition (/b,d,g/ task). We use speech of two females and four males to compare the performance of the various architectures against a baseline recognition rate of 95.9% for a single TDNN on the six-speaker /b,d,g/ task. This series of modular designs leads to a highly modular multi-network architecture capable of performing the six-speaker recognition task at the speaker-dependent rate of 98.4%. In addition to its high recognition rate, the so-called "Meta-Pi" architecture learns, without direct supervision, to recognize the speech of one particular male speaker using internal models of other male speakers exclusively.


Consonant Recognition by Modular Construction of Large Phonemic Time-Delay Neural Networks

Waibel, Alex

Neural Information Processing Systems

Encouraged by these results, we wanted to explore how we might expand on these models to make them useful for the design of speech recognition systems. A problem that emerges as we attempt to apply neural network models to the full speech recognition problem is the problem of scaling. Simply extending neural networks to ever larger structures and retraining them as one monolithic net quickly exceeds the capabilities of the fastest and largest supercomputers. The search complexity of finding good solutions in a huge space of possible network configurations also soon assumes unmanageable proportions. Moreover, having to decide on all possible classes for recognition ahead of time, as well as collecting sufficient data to train such a large monolithic network, is impractical to say the least. In an effort to extend our models from small recognition tasks to large-scale speech recognition systems, we must therefore explore modularity and incremental learning as design strategies to break up a large learning task into smaller subtasks. Breaking up a large task into subtasks to be tackled by individual black boxes interconnected in ad hoc arrangements, on the other hand, would mean abandoning one of the most attractive aspects of connectionism: the ability to perform complex constraint satisfaction in a massively parallel and interconnected fashion, in view of an overall optimal performance goal.

